Regularization and Search for Minimum Error Rate Training
Authors
Abstract
Minimum error rate training (MERT) is a widely used learning procedure for statistical machine translation models. We contrast three search strategies for MERT: Powell’s method, the variant of coordinate descent found in the Moses MERT utility, and a novel stochastic method. It is shown that the stochastic method obtains test set gains of +0.98 BLEU on MT03 and +0.61 BLEU on MT05. We also present a method for regularizing the MERT objective that achieves statistically significant gains when combined with both Powell’s method and coordinate descent.
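The coordinate-descent and Powell-style strategies contrasted above all rely on MERT's exact one-dimensional line search (Och, 2003): along a search direction, each candidate's model score is linear in the step size, so the corpus error is piecewise constant and can be minimized exactly. A minimal sketch of that line search, under assumed toy inputs (per-candidate error counts standing in for sentence-level BLEU loss; the O(n²) intersection enumeration is for clarity, not efficiency):

```python
from itertools import combinations

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def line_search(nbests, errors, w, d):
    """Exact 1-D MERT line search: minimize corpus error along w + gamma*d.

    nbests[s] : list of feature vectors for sentence s's n-best candidates
    errors[s] : per-candidate error counts (toy stand-in for 1 - BLEU)
    Returns the step gamma whose induced 1-best output has lowest total error.
    """
    # Each candidate's score along the line is linear: a_c + gamma * b_c,
    # so the 1-best changes only where two score lines intersect.
    cuts = set()
    for cands in nbests:
        ab = [(dot(w, f), dot(d, f)) for f in cands]
        for (a1, b1), (a2, b2) in combinations(ab, 2):
            if b1 != b2:  # parallel score lines never cross
                cuts.add((a2 - a1) / (b1 - b2))
    cuts = sorted(cuts)
    # Probe one gamma inside every interval between consecutive intersections.
    if cuts:
        probes = ([cuts[0] - 1.0]
                  + [(x + y) / 2 for x, y in zip(cuts, cuts[1:])]
                  + [cuts[-1] + 1.0])
    else:
        probes = [0.0]

    def total_error(gamma):
        tot = 0.0
        for cands, errs in zip(nbests, errors):
            best = max(range(len(cands)),
                       key=lambda i: dot(w, cands[i]) + gamma * dot(d, cands[i]))
            tot += errs[best]
        return tot

    return min(probes, key=total_error)
```

Coordinate descent applies this search along one axis at a time; Powell's method applies it along adaptively chosen directions.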
Similar Resources
Regularized Minimum Error Rate Training
Minimum Error Rate Training (MERT) remains one of the preferred methods for tuning linear parameters in machine translation systems, yet it faces significant issues. First, MERT is an unregularized learner and is therefore prone to overfitting. Second, it is commonly used on a noisy, non-convex loss function that becomes more difficult to optimize as the number of parameters increases. To addre...
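The overfitting issue above is the motivation for regularizing the tuning objective: penalizing large weight vectors so the tuned model does not chase noise in the development set. A minimal illustration of the idea (the L2 penalty and the value of lam are assumptions for the sketch, not the paper's actual regularization scheme):

```python
def l2_regularized(error_fn, lam):
    """Wrap a corpus-error function with an L2 weight penalty.

    Hedged sketch: lam and the squared-norm penalty are illustrative
    choices, not the specific regularizer proposed in the cited paper.
    """
    def objective(w):
        return error_fn(w) + lam * sum(x * x for x in w)
    return objective

# Toy error surface: flat direction in w[1] that an unregularized
# tuner could wander along; the penalty prefers the smaller weights.
err = lambda w: abs(w[0] - 0.5)
obj = l2_regularized(err, lam=1.0)
```

With this wrapper, `obj((0.5, 0.0))` scores better than `obj((0.5, 10.0))` even though both have identical raw error, which is exactly the tie-breaking effect a regularizer supplies on noisy objectives.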
A New Generalized Error Path Algorithm for Model Selection
Model selection with cross validation (CV) is very popular in machine learning. However, CV with grid and other common search strategies cannot guarantee to find the model with minimum CV error, which is often the ultimate goal of model selection. Recently, various solution path algorithms have been proposed for several important learning algorithms including support vector classification, Lass...
Optimal Search for Minimum Error Rate Training
Minimum error rate training is a crucial component to many state-of-the-art NLP applications, such as machine translation and speech recognition. However, common evaluation functions such as BLEU or word error rate are generally highly non-convex and thus prone to search errors. In this paper, we present LP-MERT, an exact search algorithm for minimum error rate training that reaches the global ...
Minimum Error Rate Training and the Convex Hull Semiring
We describe the line search used in the minimum error rate training algorithm (Och, 2003) as the “inside score” of a weighted proof forest under a semiring defined in terms of well-understood operations from computational geometry. This conception leads to a straightforward complexity analysis of the dynamic programming MERT algorithms of Macherey et al. (2008) and Kumar et al. (2009) and practi...
Regularization by Adding Redundant Features
The Pseudo Fisher Linear Discriminant (PFLD) based on a pseudo-inverse technique shows a peaking behaviour of the generalization error for training sample sizes that are about the feature size: with an increase in the training sample size the generalization error at first decreases reaching the minimum, then increases reaching the maximum at the point where the training sample size is equal to ...
Publication date: 2008